Konya Province
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Government (1.00)
- Leisure & Entertainment > Games > Computer Games (0.46)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Austria > Vienna (0.14)
- Oceania (0.05)
- (29 more...)
- Law (1.00)
- Government (0.93)
On the Role of Calibration in Benchmarking Algorithmic Fairness for Skin Cancer Detection
Dominique, Brandon, Lam, Prudence, Kurtansky, Nicholas, Weber, Jochen, Kose, Kivanc, Rotemberg, Veronica, Dy, Jennifer
Artificial Intelligence (AI) models have demonstrated expert-level performance in melanoma detection, yet their clinical adoption is hindered by performance disparities across demographic subgroups such as gender, race, and age. Previous efforts to benchmark the performance of AI models have primarily focused on assessing model performance using group fairness metrics that rely on the Area Under the Receiver Operating Characteristic curve (AUROC), which does not provide insights into a model's ability to provide accurate estimates. In line with clinical assessments, this paper addresses this gap by incorporating calibration as a complementary benchmarking metric to AUROC-based fairness metrics. Calibration evaluates the alignment between predicted probabilities and observed event rates, offering deeper insights into subgroup biases. We assess the performance of the leading skin cancer detection algorithm of the ISIC 2020 Challenge on the ISIC 2020 Challenge dataset and the PROVE-AI dataset, and compare it with the second and third place models, focusing on subgroups defined by sex, race (Fitzpatrick Skin Tone), and age. Our findings reveal that while existing models enhance discriminative accuracy, they often over-diagnose risk and exhibit calibration issues when applied to new datasets. This study underscores the necessity for comprehensive model auditing strategies and extensive metadata collection to achieve equitable AI-driven healthcare solutions. All code is publicly available at https://github.com/bdominique/testing_strong_calibration.
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Oceania > Australia > Queensland > Brisbane (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (0.92)
- Government > Regional Government > North America Government > United States Government (0.67)
Computational Analysis of Conversation Dynamics through Participant Responsivity
Hughes, Margaret, Roy, Brandon, Poole-Dayan, Elinor, Roy, Deb, Kabbara, Jad
Growing literature explores toxicity and polarization in discourse, with comparatively less work on characterizing what makes dialogue prosocial and constructive. We explore conversational discourse and investigate a method for characterizing its quality built upon the notion of ``responsivity'' -- whether one person's conversational turn is responding to a preceding turn. We develop and evaluate methods for quantifying responsivity -- first through semantic similarity of speaker turns, and second by leveraging state-of-the-art large language models (LLMs) to identify the relation between two speaker turns. We evaluate both methods against a ground truth set of human-annotated conversations. Furthermore, selecting the better performing LLM-based approach, we characterize the nature of the response -- whether it responded to that preceding turn in a substantive way or not. We view these responsivity links as a fundamental aspect of dialogue but note that conversations can exhibit significantly different responsivity structures. Accordingly, we then develop conversation-level derived metrics to address various aspects of conversational discourse. We use these derived metrics to explore other conversations and show that they support meaningful characterizations and differentiations across a diverse collection of conversations.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Middle East > Republic of Türkiye > Konya Province > Konya (0.04)
- North America > United States > Texas (0.04)
- (13 more...)
- Government (0.93)
- Health & Medicine (0.67)
Stable-by-Design Neural Network-Based LPV State-Space Models for System Identification
Sertbaş, Ahmet Eren, Kumbasar, Tufan
Accurate modeling of nonlinear systems is essential for reliable control, yet conventional identification methods often struggle to capture latent dynamics while maintaining stability. We propose a \textit{stable-by-design LPV neural network-based state-space} (NN-SS) model that simultaneously learns latent states and internal scheduling variables directly from data. The state-transition matrix, generated by a neural network using the learned scheduling variables, is guaranteed to be stable through a Schur-based parameterization. The architecture combines an encoder for initial state estimation with a state-space representer network that constructs the full set of scheduling-dependent system matrices. For training the NN-SS, we develop a framework that integrates multi-step prediction losses with a state-consistency regularization term, ensuring robustness against drift and improving long-horizon prediction accuracy. The proposed NN-SS is evaluated on benchmark nonlinear systems, and the results demonstrate that the model consistently matches or surpasses classical subspace identification methods and recent gradient-based approaches. These findings highlight the potential of stability-constrained neural LPV identification as a scalable and reliable framework for modeling complex nonlinear systems.
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- North America > United States (0.04)
- (3 more...)
Prompting is not Enough: Exploring Knowledge Integration and Controllable Generation
Shen, Tingjia, Wang, Hao, Qin, Chuan, Sun, Ruijun, Song, Yang, Lian, Defu, Zhu, Hengshu, Chen, Enhong
Open-domain question answering (OpenQA) represents a cornerstone in natural language processing (NLP), primarily focused on extracting answers from unstructured textual data. With the rapid advancements in Large Language Models (LLMs), LLM-based OpenQA methods have reaped the benefits of emergent understanding and answering capabilities enabled by massive parameters compared to traditional methods. However, most of these methods encounter two critical challenges: how to integrate knowledge into LLMs effectively and how to adaptively generate results with specific answer formats for various task situations. To address these challenges, we propose a novel framework named GenKI, which aims to improve the OpenQA performance by exploring Knowledge Integration and controllable Generation on LLMs simultaneously. Specifically, we first train a dense passage retrieval model to retrieve associated knowledge from a given knowledge base. Subsequently, we introduce a novel knowledge integration model that incorporates the retrieval knowledge into instructions during fine-tuning to intensify the model. Furthermore, to enable controllable generation in LLMs, we leverage a certain fine-tuned LLM and an ensemble based on text consistency incorporating all coherence, fluency, and answer format assurance. Finally, extensive experiments conducted on the TriviaQA, MSMARCO, and CMRC2018 datasets, featuring diverse answer formats, have demonstrated the effectiveness of GenKI with comparison of state-of-the-art baselines. Moreover, ablation studies have disclosed a linear relationship between the frequency of retrieved knowledge and the model's ability to recall knowledge accurately against the ground truth. Our code of GenKI is available at https://github.com/USTC-StarTeam/GenKI
- Europe > Ukraine > Kyiv Oblast > Chernobyl (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
- (12 more...)
Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
Zhang, Lily Hong, Milli, Smitha, Jusko, Karen, Smith, Jonathan, Amos, Brandon, Bouaziz, Wassim, Revel, Manon, Kussman, Jack, Sheynin, Yasha, Titus, Lisa, Radharapu, Bhaktipriya, Yu, Jane, Sarma, Vidya, Rose, Kris, Nickel, Maximilian
How can large language models (LLMs) serve users with varying preferences that may conflict across cultural, political, or other dimensions? To advance this challenge, this paper establishes four key results. First, we demonstrate, through a large-scale multilingual human study with representative samples from five countries (N=15,000), that humans exhibit significantly more variation in preferences than the responses of 21 state-of-the-art LLMs. Second, we show that existing methods for preference dataset collection are insufficient for learning the diversity of human preferences even along two of the most salient dimensions of variability in global values, due to the underlying homogeneity of candidate responses. Third, we argue that this motivates the need for negatively-correlated sampling when generating candidate sets, and we show that simple prompt-based techniques for doing so significantly enhance the performance of alignment methods in learning heterogeneous preferences. Fourth, based on this novel candidate sampling approach, we collect and open-source Community Alignment, the largest and most representative multilingual and multi-turn preference dataset to date, featuring almost 200,000 comparisons from annotators spanning five countries. We hope that the Community Alignment dataset will be a valuable resource for improving the effectiveness of LLMs for a diverse global population.
- Europe > Austria > Vienna (0.13)
- Asia > India (0.04)
- South America > Brazil (0.04)
- (24 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (0.92)
- (4 more...)
- Water & Waste Management > Water Management > Water Supplies & Services (1.00)
- Media > Music (1.00)
- Materials > Chemicals > Agricultural Chemicals (1.00)
- (17 more...)
Attention to Non-Adopters
Zhou, Kaitlyn, Gligorić, Kristina, Cheng, Myra, Lam, Michelle S., Raman, Vyoma, Aminu, Boluwatife, Woo, Caeley, Brockman, Michael, Cha, Hannah, Jurafsky, Dan
Although language model-based chat systems are increasingly used in daily life, most Americans remain non-adopters of chat-based LLMs -- as of June 2025, 66% had never used ChatGPT. At the same time, LLM development and evaluation rely mainly on data from adopters (e.g., logs, preference data), focusing on the needs and tasks for a limited demographic group of adopters in terms of geographic location, education, and gender. In this position paper, we argue that incorporating non-adopter perspectives is essential for developing broadly useful and capable LLMs. We contend that relying on methods that focus primarily on adopters will risk missing a range of tasks and needs prioritized by non-adopters, entrenching inequalities in who benefits from LLMs, and creating oversights in model development and evaluation. To illustrate this claim, we conduct case studies with non-adopters and show: how non-adopter needs diverge from those of current users, how non-adopter needs point us towards novel reasoning tasks, and how to systematically integrate non-adopter needs via human-centered methods.
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > United States > Oklahoma (0.04)
- North America > United States > Alaska (0.04)
- (12 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Overview (0.93)
- Health & Medicine > Therapeutic Area (1.00)
- Education > Educational Setting (0.93)
- Transportation (0.68)
Machine Learning and Public Health: Identifying and Mitigating Algorithmic Bias through a Systematic Review
Altamirano, Sara, Vreeken, Arjan, Ghebreab, Sennay
Machine learning (ML) promises to revolutionize public health through improved surveillance, risk stratification, and resource allocation. However, without systematic attention to algorithmic bias, ML may inadvertently reinforce existing health disparities. We present a systematic literature review of algorithmic bias identification, discussion, and reporting in Dutch public health ML research from 2021 to 2025. To this end, we developed the Risk of Algorithmic Bias Assessment Tool (RABA T) by integrating elements from established frameworks (Cochrane Risk of Bias, PROBAST, Microsoft Responsible AI checklist) and applied it to 35 peer-reviewed studies. Our analysis reveals pervasive gaps: although data sampling and missing data practices are well documented, most studies omit explicit fairness framing, subgroup analyses, and transparent discussion of potential harms. In response, we introduce a four-stage fairness-oriented framework called ACAR (A wareness, Conceptualization, Application, Reporting), with guiding questions derived from our systematic literature review to help researchers address fairness across the ML lifecycle. We conclude with actionable recommendations for public health ML practitioners to consistently consider algorithmic bias and foster transparency, ensuring that algorithmic innovations advance health equity rather than undermine it.
- North America > United States (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- North America > Canada (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Public Health (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.68)